[python] Support VARIANT type in pypaimon#7635
chenghuichen wants to merge 7 commits into apache:master
The PR is ready for review now.

Thanks @chenghuichen, let me check this.

    # Constants (matching GenericVariantUtil.java)
    # ---------------------------------------------------------------------------

    _PRIMITIVE = 0

Duplicated binary constants and helpers across generic_variant.py and variant_shredding.py
Both files define _PRIMITIVE, _SHORT_STR, _OBJECT, _ARRAY, _U8_MAX, _U32_SIZE, _VERSION_MASK, _read_unsigned, _get_int_size, _object_header, _array_header, etc. This is a maintenance risk — if the spec changes, both files need updating. Consider extracting shared constants/helpers into a small _variant_binary.py module.

        return bytes(buf)


    def _extract_overflow_fields(overflow_bytes: bytes) -> List[Tuple[int, bytes]]:

The comment explains that data may be laid out in a different order than the id table. The sorting-by-offset logic is correct but intricate. Consider adding a small inline example or diagram showing the case where insertion-order data differs from sorted-id order, to help future maintainers.
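
Something like the following could serve as that inline example. It is a simplified sketch, not the PR's `_extract_overflow_fields`: the function name and signature are hypothetical, and it takes an already-parsed id table and offset table rather than raw object bytes. The field values assume the Variant encoding for an int8 (`0x0c` header) and a short string (`(len << 2) | 1` header).

```python
from typing import List, Tuple


def slice_fields_by_offset(
    field_ids: List[int], offsets: List[int], data: bytes
) -> List[Tuple[int, bytes]]:
    """Return (field_id, value_bytes) in id-table order.

    offsets[i] is where the value for field_ids[i] starts inside `data`.
    The values may be stored in a different order than the id table, so a
    value's end is the next-higher start offset, not offsets[i + 1].
    """
    # Visit entries by ascending start offset to find each value's end.
    order = sorted(range(len(offsets)), key=lambda i: offsets[i])
    ends = {}
    for rank, i in enumerate(order):
        if rank + 1 < len(order):
            ends[i] = offsets[order[rank + 1]]
        else:
            ends[i] = len(data)  # last value runs to the end of the data
    return [(fid, data[offsets[i]:ends[i]]) for i, fid in enumerate(field_ids)]


# Keys were inserted as "b" (id 0) then "a" (id 1), so the data section
# holds b's value first. The id table is sorted by key name: [a, b].
#
#   id table : [1, 0]          (a, b)
#   offsets  : [2, 0]          (a starts at 2, b starts at 0)
#   data     : 0c 07 | 09 68 69
#              b = int8 7      a = short string "hi"
data = b"\x0c\x07" + b"\x09hi"
fields = slice_fields_by_offset([1, 0], [2, 0], data)
```

Taking `offsets[i + 1] - offsets[i]` in id-table order would give a negative length for field `a` here, which is why the offsets are sorted first.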
Purpose
Background: #7655
This PR adds VARIANT read/write support to pypaimon, with a particular focus on shredded VARIANT.
- When variant.shreddingSchema is configured on a table, VARIANT columns are written in shredded Parquet format according to the schema.
- On read, VARIANT columns are returned in the struct<value: binary, metadata: binary> form, transparent to the caller.

Shredded column pruning and predicate pushdown will be built on top of this PR.
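
For readers unfamiliar with the unshredded form, here is a rough illustration of what one value/metadata pair looks like for a single primitive. It uses only the standard library and is not pypaimon's API: the helper names are hypothetical, and the byte layout (metadata header with version 1 and an empty dictionary, int8 primitive type id 3) is assumed from the Variant binary encoding.

```python
# Metadata with no dictionary entries: header byte (version 1),
# dictionary size 0, and a single end offset of 0.
EMPTY_METADATA = bytes([0x01, 0x00, 0x00])

_PRIMITIVE = 0   # basic type, low 2 bits of the value header
_INT8_TYPE = 3   # assumed primitive type id for int8


def encode_int8(v: int) -> bytes:
    """Encode a Python int as a Variant int8 value (hypothetical helper)."""
    header = (_INT8_TYPE << 2) | _PRIMITIVE
    return bytes([header]) + v.to_bytes(1, "little", signed=True)


def decode_int8(value: bytes) -> int:
    """Decode a Variant int8 value, rejecting any other type."""
    header = value[0]
    if header & 0x03 != _PRIMITIVE or header >> 2 != _INT8_TYPE:
        raise ValueError("not an int8 variant value")
    return int.from_bytes(value[1:2], "little", signed=True)


# One row of a VARIANT column in struct<value: binary, metadata: binary> form.
row = {"value": encode_int8(42), "metadata": EMPTY_METADATA}
```

A shredded file stores typed columns in addition to (or instead of) these raw bytes; the caller still sees this two-field struct.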
Tests
- pypaimon/tests/variant_test.py
- run_java_variant_write_py_read_test
- run_py_variant_write_java_read_test